Summary

  • The goal of this project is to use machine learning to predict Ag archiving capacity (or ability) in LECs from naive mice.
  • For a proof-of-principle analysis, Ag-tracking data for d14 cLECs were used to train a random forest classifier to predict Ag status.
  • Using this model, we defined a gene program that correlates with Ag status at various timepoints.
  • Archiving-competent cLECs can be predicted in the CHIKV LN scRNA-seq data.
  • There is a reduction in archiving-competent cLECs in CHIKV-infected mice and a broad downregulation of the Ag-archiving gene program.
  • The central goal for this project is to optimize the model (e.g. expand it to other cell types) and use it to assess archiving capacity in samples that did not receive an Ag-tag (e.g. other published datasets). We can then identify perturbations or treatments that are predicted to impair archiving.


Ag signal is shown below for broad cell types identified in the CD45- dataset.


Ag signal is shown below for day 14 LEC subsets.


Ag signal is shown below for cLEC subsets for each timepoint.





Classifying Ag-high

Ag-low and Ag-high cells were identified by clustering the cells of each LEC subset, separately for each sample, into two groups based on Ag-score. For the 6wk-3wk sample, the 3wk Ag-score was used. The Ag-low/high classifications used for the analysis are shown below.
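
The per-sample, per-subset split described above can be sketched in Python as follows. This is a minimal illustration, not the method used for the report: the clustering algorithm is not stated here, so a one-dimensional two-centre k-means stands in for it, and the field names (`sample`, `subset`, `ag_score`) and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_low_high(scores, n_iter=100):
    """Split 1D Ag-scores into 'low'/'high' using two-centre k-means.

    Assumption: k-means is a stand-in; the report does not name the
    clustering method that was actually used.
    """
    lo, hi = min(scores), max(scores)
    if lo == hi:
        # No spread in the scores, so every cell is called Ag-low.
        return ["low"] * len(scores)
    for _ in range(n_iter):
        cut = (lo + hi) / 2
        new_lo = mean(s for s in scores if s <= cut)
        new_hi = mean(s for s in scores if s > cut)
        if (new_lo, new_hi) == (lo, hi):
            break  # centres stopped moving
        lo, hi = new_lo, new_hi
    cut = (lo + hi) / 2
    return ["high" if s > cut else "low" for s in scores]


def classify_cells(cells):
    """Label each cell Ag-low/high within its own (sample, subset) group."""
    groups = defaultdict(list)
    for i, cell in enumerate(cells):
        groups[(cell["sample"], cell["subset"])].append(i)
    labels = [None] * len(cells)
    for idx in groups.values():
        group_labels = split_low_high([cells[j]["ag_score"] for j in idx])
        for j, label in zip(idx, group_labels):
            labels[j] = label
    return labels
```

Splitting within each group, rather than across all cells at once, keeps a sample with globally weak signal from being labelled entirely Ag-low.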


Ag-low and -high cells are shown below for cLECs.


Ag-low and -high cells are shown below for d14 LEC subsets.


A random forest classifier was trained using data for d14 cLECs. The model was then used to predict Ag-high cells in the other Ag datasets.

The fraction of cells belonging to each predicted Ag group is shown on the left for cLECs from each sample. The fraction of true Ag-low, true Ag-high, and false-positive Ag-high cells (high-pred) is shown on the right.

  • The model is fairly accurate when predicting Ag-high cells in the training and test data (d14 cLECs) but performs less well when predicting Ag-low cells. This can likely be improved with further optimization.
  • Since we want to identify gene signatures that are expressed in naive mice and continue to be expressed after Ag levels have fallen, we expect to observe an increasing fraction of false-positive Ag-high cells at the later timepoints.
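
To make the three plotted groups concrete, the Python sketch below combines measured and predicted labels into Ag-low, Ag-high, and high-pred classes and computes their fractions. It rests on an assumption not stated in the report: cells measured as Ag-high stay in the Ag-high group whatever the model predicts. The function names are hypothetical.

```python
from collections import Counter

CLASSES = ("Ag-low", "Ag-high", "high-pred")


def ag_class(true_label, pred_label):
    """Combine measured and predicted Ag status into one class.

    Assumption: measured Ag-high cells are always 'Ag-high'; only
    measured Ag-low cells that the model calls high become 'high-pred'.
    """
    if true_label == "high":
        return "Ag-high"
    if pred_label == "high":
        return "high-pred"
    return "Ag-low"


def class_fractions(true_labels, pred_labels):
    """Fraction of cells in each class for one sample."""
    counts = Counter(ag_class(t, p) for t, p in zip(true_labels, pred_labels))
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in CLASSES}
    return {k: counts[k] / total for k in CLASSES}
```

Under this scheme, a growing high-pred fraction at later timepoints reflects cells that carry the expression signature but no longer carry measurable Ag.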


Model accuracy was assessed for each LEC subset. F1 scores are shown for different combinations of testing and training data.

  • Ag-high cLECs, collecting LECs, and fLECs are the easiest to predict. All models show high F1 scores when tested on these LEC subsets.
  • Ag-high Ptx3 LECs and BECs are the most difficult to predict accurately. This is expected, since these populations have the lowest Ag signal and the fewest Ag-high cells.
  • The F1 score is not always highest when a model is tested on the cell type it was trained on. This is not necessarily surprising, since the models were selected using several metrics in addition to the F1 score.
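
For reference, the F1 score is the harmonic mean of precision and recall for the positive class. The short Python sketch below assumes Ag-high is treated as the positive class; the function name and label strings are illustrative only.

```python
def f1_score(true_labels, pred_labels, positive="high"):
    """F1 = 2 * precision * recall / (precision + recall)."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, a subset with very few Ag-high cells can score poorly even when most of its cells are classified correctly, which is consistent with the lower scores for the subsets with the fewest Ag-high cells.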


F1 scores are shown for the d14 timepoint using testing and training data from the same LEC subset.




Ag modules

Expression of the upregulated (top) and downregulated (bottom) gene modules that are most predictive of Ag signal is shown below.

  • Expression of these genes correlates with Ag class.
  • False-positive Ag-high cells (high-pred) show an intermediate level of expression that falls roughly between that of the true Ag-low and true Ag-high cells.
  • The false-positive Ag-high cells are potentially archiving-competent cells that have lost or released most of their Ag by the later timepoints.
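
As an illustration of how module expression can be compared across Ag classes, the Python sketch below scores each cell by the mean expression of the module genes and then takes the median score per class. This is an assumption made for illustration: the report does not state how module expression was summarized, and the field names (`ag_class`, `expr`) and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def module_score(expr, module_genes):
    """Mean expression of the module genes detected in one cell.

    Assumption: a plain mean is used here; the scoring method behind
    the report's figures is not specified.
    """
    values = [expr[g] for g in module_genes if g in expr]
    return mean(values) if values else 0.0


def median_score_by_class(cells, module_genes):
    """Median module score for each Ag class (Ag-low, Ag-high, high-pred)."""
    by_class = defaultdict(list)
    for cell in cells:
        by_class[cell["ag_class"]].append(module_score(cell["expr"], module_genes))
    return {k: median(v) for k, v in by_class.items()}
```

An intermediate high-pred median, between the Ag-low and Ag-high medians, is the pattern described in the bullets above.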


Expression of the Ag-high and -low gene modules is shown below for 14, 21, and 42 day timepoints.


The cLEC Ag-high gene module is shown below.


UMAP projections show Ag-high module expression (top), true Ag-low vs true Ag-high (middle), and false-positive Ag-high (high-pred) vs true Ag-high (bottom).

  • False-positive Ag-high cells (high-pred) show strong overlap with true Ag-high cells.

cLEC


Collecting


fLEC


Ptx3_LEC


Expression of Ag-high and Ag-low gene modules is compared for the 6wk, 3wk, and 6wk-3wk samples. Module expression is shown for Ag double-positive cells from the 6wk-3wk mouse (double-high), cells positive for a single Ag-tag (single-high), and Ag-high cells from the 6wk or 3wk mouse (6wk-high, 3wk-high). All Ag-low cells are plotted as a single group.

  • Compared to Ag-high cLECs from the 6wk and 3wk mice, double-positive Ag-high cLECs show slightly higher expression of the Ag-high module.
  • This effect requires more investigation, but it suggests that double-positive cells may be better equipped to archive antigen.




Ag features

Mean expression in cLECs is shown on the left for genes from the Ag-high module for true Ag-low, true Ag-high, and false-positive Ag-high (high-pred) cells. Expression is shown on the right for select top features.

Points show median expression, grey bars show the interquartile range, the dotted line shows the trend, and arrows indicate that the gene is significantly up- or downregulated compared to Ag-low cells.

The Ag-high gene module is shown below for cLECs.


Collecting-high


Ptx3_LEC-high


cLEC-high


fLEC-high


Collecting-low


Ptx3_LEC-low


cLEC-low


fLEC-low





Ag archiving in CHIKV

The fraction of cells predicted to be Ag-high (i.e. archiving competent) is shown below for each biological replicate. p-values < 0.05 are shown.

  • There is a reduction in archiving-competent cells in CHIKV-infected LN samples.
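
The per-replicate fractions can be sketched in Python as follows. This is a minimal illustration with hypothetical field names (`treatment`, `replicate`, `pred`) and function names; it computes only the fractions and group means, not the p-values, since the statistical test used for the report is not stated.

```python
from collections import defaultdict
from statistics import mean


def fraction_high_by_replicate(cells):
    """Fraction of cells predicted Ag-high per (treatment, replicate)."""
    counts = defaultdict(lambda: [0, 0])  # [predicted high, total cells]
    for cell in cells:
        key = (cell["treatment"], cell["replicate"])
        counts[key][0] += cell["pred"] == "high"
        counts[key][1] += 1
    return {key: n_high / n for key, (n_high, n) in counts.items()}


def mean_fraction_by_treatment(fractions):
    """Average the per-replicate fractions within each treatment."""
    by_treatment = defaultdict(list)
    for (treatment, _replicate), frac in fractions.items():
        by_treatment[treatment].append(frac)
    return {t: mean(v) for t, v in by_treatment.items()}
```

Summarizing at the replicate level, rather than pooling cells, keeps each mouse as the unit of comparison between mock and CHIKV samples.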


The fraction of cells predicted to be Ag-high is shown below for 24 hpi cLECs.


Expression of the Ag-high (top) and Ag-low (bottom) gene modules is shown below for each predicted Ag class for the 24 hpi timepoint.

  • Cells predicted to be archiving-competent show upregulation of the Ag-high gene module.
  • The Ag-low gene module shows similar expression between samples.


Expression of Ag modules is shown for predicted Ag-low and Ag-high cLECs.


Expression of the Ag-high gene module is shown below for mock vs CHIKV.


Mean expression is shown for genes from the Ag-high module for mock- and CHIKV-infected mice from the 24 hpi timepoint.

  • CHIKV-infected mice broadly downregulate the Ag-high gene modules.


Collecting




cLEC




fLEC




Session info

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
##  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
##  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
##  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
##  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
## [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
## 
## time zone: America/Denver
## tzcode source: system (glibc)
## 
## attached base packages:
##  [1] stats4    tools     grid      stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] ggtree_3.8.2          GOSemSim_2.26.1       org.Mm.eg.db_3.17.0  
##  [4] AnnotationDbi_1.62.2  IRanges_2.34.1        S4Vectors_0.38.2     
##  [7] Biobase_2.60.0        BiocGenerics_0.46.0   msigdbr_7.5.1        
## [10] enrichplot_1.20.3     clusterProfiler_4.8.3 caret_6.0-94         
## [13] lattice_0.21-8        furrr_0.3.1           future_1.33.0        
## [16] ranger_0.15.1         rsample_1.2.0         harmony_1.0.3        
## [19] biomaRt_2.56.1        openxlsx_4.2.5.2      MetBrewer_0.2.0      
## [22] rdrop2_0.8.2.1        ggtext_0.1.2          ggtrace_0.2.0        
## [25] qs_0.25.5             vroom_1.6.3           M3Drop_1.26.0        
## [28] numDeriv_2016.8-1.1   djvdj_0.1.0           gtools_3.9.4         
## [31] clustifyrdata_1.1.0   here_1.0.1            presto_1.0.0         
## [34] data.table_1.14.8     Rcpp_1.0.11           devtools_2.4.5       
## [37] usethis_2.2.2         ComplexHeatmap_2.16.0 patchwork_1.1.3      
## [40] scales_1.2.1          boot_1.3-28.1         clustifyr_1.12.0     
## [43] mixtools_2.0.0        broom_1.0.5           colorblindr_0.1.0    
## [46] colorspace_2.1-0      xlsx_0.6.5            RColorBrewer_1.1-3   
## [49] ggrepel_0.9.3         cowplot_1.1.1         knitr_1.44           
## [52] gprofiler2_0.2.2      SeuratObject_4.1.4    Seurat_4.4.0         
## [55] ggforce_0.4.1         ggbeeswarm_0.7.2      lubridate_1.9.3      
## [58] forcats_1.0.0         stringr_1.5.0         dplyr_1.1.3          
## [61] purrr_1.0.2           readr_2.1.4           tidyr_1.3.0          
## [64] tibble_3.2.1          ggplot2_3.4.3         tidyverse_2.0.0      
## 
## loaded via a namespace (and not attached):
##   [1] igraph_1.5.1                ica_1.0-3                  
##   [3] plotly_4.10.2               Formula_1.2-5              
##   [5] zlibbioc_1.46.0             tidyselect_1.2.0           
##   [7] bit_4.0.5                   doParallel_1.0.17          
##   [9] clue_0.3-65                 rjson_0.2.21               
##  [11] blob_1.2.4                  urlchecker_1.0.1           
##  [13] S4Arrays_1.0.6              parallel_4.3.1             
##  [15] png_0.1-8                   cli_3.6.1                  
##  [17] ggplotify_0.1.2             goftest_1.2-3              
##  [19] kernlab_0.9-32              densEstBayes_1.0-2.2       
##  [21] uwot_0.1.16                 shadowtext_0.1.2           
##  [23] curl_5.0.2                  mime_0.12                  
##  [25] evaluate_0.22               tidytree_0.4.5             
##  [27] leiden_0.4.3                stringi_1.7.12             
##  [29] pROC_1.18.4                 backports_1.4.1            
##  [31] XML_3.99-0.14               httpuv_1.6.11              
##  [33] magrittr_2.0.3              rappdirs_0.3.3             
##  [35] splines_4.3.1               prodlim_2023.08.28         
##  [37] RApiSerialize_0.1.2         ggraph_2.1.0               
##  [39] sctransform_0.4.0           sessioninfo_1.2.2          
##  [41] DBI_1.1.3                   jquerylib_0.1.4            
##  [43] withr_2.5.1                 class_7.3-22               
##  [45] rprojroot_2.0.3             lmtest_0.9-40              
##  [47] bdsmatrix_1.3-6             tidygraph_1.2.3            
##  [49] htmlwidgets_1.6.2           fs_1.6.3                   
##  [51] SingleCellExperiment_1.22.0 segmented_1.6-4            
##  [53] labeling_0.4.3              MatrixGenerics_1.12.3      
##  [55] reticulate_1.32.0           zoo_1.8-12                 
##  [57] XVector_0.40.0              timechange_0.2.0           
##  [59] foreach_1.5.2               fansi_1.0.4                
##  [61] caTools_1.18.2              timeDate_4022.108          
##  [63] irlba_2.3.5.1               gridGraphics_0.5-1         
##  [65] ellipsis_0.3.2              lazyeval_0.2.2             
##  [67] yaml_2.3.7                  survival_3.5-5             
##  [69] scattermore_1.2             crayon_1.5.2               
##  [71] RcppAnnoy_0.0.21            progressr_0.14.0           
##  [73] tweenr_2.0.2                later_1.3.1                
##  [75] ggridges_0.5.4              codetools_0.2-19           
##  [77] base64enc_0.1-3             GlobalOptions_0.1.2        
##  [79] profvis_0.3.8               KEGGREST_1.40.1            
##  [81] bbmle_1.0.25                Rtsne_0.16                 
##  [83] shape_1.4.6                 filelock_1.0.2             
##  [85] foreign_0.8-84              pkgconfig_2.0.3            
##  [87] xml2_1.3.5                  GenomicRanges_1.52.0       
##  [89] aplot_0.2.2                 spatstat.sparse_3.0-2      
##  [91] ape_5.7-1                   viridisLite_0.4.2          
##  [93] xtable_1.8-4                plyr_1.8.8                 
##  [95] httr_1.4.7                  globals_0.16.2             
##  [97] hardhat_1.3.0               pkgbuild_1.4.2             
##  [99] beeswarm_0.4.0              htmlTable_2.4.1            
## [101] checkmate_2.2.0             nlme_3.1-162               
## [103] loo_2.6.0                   HDO.db_0.99.1              
## [105] dbplyr_2.3.4                digest_0.6.33              
## [107] Matrix_1.6-1.1              farver_2.1.1               
## [109] tzdb_0.4.0                  reshape2_1.4.4             
## [111] ModelMetrics_1.2.2.2        yulab.utils_0.1.0          
## [113] viridis_0.6.4               rpart_4.1.19               
## [115] glue_1.6.2                  cachem_1.0.8               
## [117] BiocFileCache_2.8.0         polyclip_1.10-6            
## [119] Hmisc_5.1-1                 generics_0.1.3             
## [121] Biostrings_2.68.1           mvtnorm_1.2-3              
## [123] parallelly_1.36.0           pkgload_1.3.3              
## [125] statmod_1.5.0               pbapply_1.7-2              
## [127] SummarizedExperiment_1.30.2 gson_0.1.0                 
## [129] utf8_1.2.3                  gower_1.0.1                
## [131] graphlayouts_1.0.1          StanHeaders_2.26.28        
## [133] gridExtra_2.3               shiny_1.7.5                
## [135] lava_1.7.2.1                GenomeInfoDbData_1.2.10    
## [137] RCurl_1.98-1.12             memoise_2.0.1              
## [139] rmarkdown_2.25              downloader_0.4             
## [141] RANN_2.6.1                  stringfish_0.15.8          
## [143] spatstat.data_3.0-1         rstudioapi_0.15.0          
## [145] cluster_2.1.4               QuickJSR_1.0.6             
## [147] rstantools_2.3.1.1          spatstat.utils_3.0-3       
## [149] hms_1.1.3                   fitdistrplus_1.1-11        
## [151] munsell_0.5.0               rlang_1.1.1                
## [153] GenomeInfoDb_1.36.3         ipred_0.9-14               
## [155] circlize_0.4.15             mgcv_1.8-42                
## [157] xfun_0.40                   e1071_1.7-13               
## [159] remotes_2.4.2.1             recipes_1.0.8              
## [161] iterators_1.0.14            matrixStats_1.0.0          
## [163] reldist_1.7-2               abind_1.4-5                
## [165] rstan_2.26.23               treeio_1.24.3              
## [167] rJava_1.0-6                 bitops_1.0-7               
## [169] ps_1.7.5                    promises_1.2.1             
## [171] inline_0.3.19               scatterpie_0.2.1           
## [173] RSQLite_2.3.1               qvalue_2.32.0              
## [175] proxy_0.4-27                fgsea_1.26.0               
## [177] DelayedArray_0.26.7         GO.db_3.17.0               
## [179] compiler_4.3.1              prettyunits_1.2.0          
## [181] listenv_0.9.0               tensor_1.5                 
## [183] MASS_7.3-60                 progress_1.2.2             
## [185] BiocParallel_1.34.2         gridtext_0.1.5             
## [187] babelgene_22.9              spatstat.random_3.1-6      
## [189] R6_2.5.1                    fastmap_1.1.1              
## [191] fastmatch_1.1-4             vipor_0.4.5                
## [193] ROCR_1.0-11                 nnet_7.3-19                
## [195] gtable_0.3.4                KernSmooth_2.23-21         
## [197] miniUI_0.1.1.1              deldir_1.0-9               
## [199] htmltools_0.5.6             RcppParallel_5.1.7         
## [201] bit64_4.0.5                 spatstat.explore_3.2-3     
## [203] lifecycle_1.0.3             zip_2.3.0                  
## [205] processx_3.8.2              callr_3.7.3                
## [207] xlsxjars_0.6.1              sass_0.4.7                 
## [209] vctrs_0.6.3                 spatstat.geom_3.2-5        
## [211] DOSE_3.26.2                 ggfun_0.1.3                
## [213] sp_2.0-0                    future.apply_1.11.0        
## [215] entropy_1.3.1               bslib_0.5.1                
## [217] pillar_1.9.0                gplots_3.1.3               
## [219] jsonlite_1.8.7              GetoptLong_1.0.5